Tags: deep learning*


  1. AlexNet, a groundbreaking neural network developed in 2012 by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, has been released in source code form by the Computer History Museum in collaboration with Google. This model significantly advanced the field of AI by demonstrating a massive leap in image recognition capabilities.

  2. This study demonstrates that neural activity in the human brain aligns linearly with the internal contextual embeddings of speech and language within large language models (LLMs) as they process everyday conversations.
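
    A minimal sketch of the kind of linear encoding analysis this describes, assuming hypothetical arrays of per-word LLM embeddings and recorded neural activity; the variable names and the use of scikit-learn's Ridge are illustrative, not the study's actual pipeline:

      import numpy as np
      from sklearn.linear_model import Ridge
      from sklearn.model_selection import train_test_split

      # Hypothetical data, one row per word heard in conversation:
      # embeddings: contextual LLM embeddings (n_words x embedding_dim)
      # neural:     recorded activity, e.g. electrode signals (n_words x n_electrodes)
      rng = np.random.default_rng(0)
      embeddings = rng.normal(size=(1000, 768))
      neural = rng.normal(size=(1000, 64))

      X_train, X_test, y_train, y_test = train_test_split(
          embeddings, neural, test_size=0.2, random_state=0)

      # Linear encoding model: predict neural activity from the embeddings.
      model = Ridge(alpha=1.0).fit(X_train, y_train)

      # Per-electrode correlation between predicted and actual activity is the
      # usual measure of how well the embeddings align linearly with the brain.
      pred = model.predict(X_test)
      corr = [np.corrcoef(pred[:, i], y_test[:, i])[0, 1] for i in range(neural.shape[1])]
      print(f"mean encoding correlation: {np.mean(corr):.3f}")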

  3. ByteDance Research has released DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization), an open-source reinforcement learning system for LLMs that aims to improve reasoning ability and address reproducibility issues. DAPO combines four techniques: Clip-Higher, Dynamic Sampling, Token-level Policy Gradient Loss, and Overlong Reward Shaping, and reaches a score of 50 on the AIME 2024 benchmark with the Qwen2.5-32B model.
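
    A minimal sketch of a token-level clipped policy-gradient loss in the spirit of DAPO's Clip-Higher and token-level averaging, written in PyTorch; the function name, tensor shapes, and clip values are illustrative assumptions, not ByteDance's released code:

      import torch

      def dapo_token_level_loss(logprobs, old_logprobs, advantages, mask,
                                eps_low=0.2, eps_high=0.28):
          # logprobs, old_logprobs: (batch, seq_len) per-token log-probs of sampled tokens
          # advantages:             (batch, 1) or (batch, seq_len) advantage estimates
          # mask:                   (batch, seq_len), 1 for response tokens, 0 for padding
          ratio = torch.exp(logprobs - old_logprobs)
          # Clip-Higher: a wider upward clip range than downward, so low-probability
          # tokens can still gain probability mass during exploration.
          clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
          per_token = torch.minimum(ratio * advantages, clipped * advantages)
          # Token-level loss: average over every valid token in the batch, so long
          # responses are not down-weighted by per-sequence averaging.
          return -(per_token * mask).sum() / mask.sum().clamp(min=1)

      # Toy usage with random tensors.
      b, t = 4, 32
      lp, old_lp = torch.randn(b, t), torch.randn(b, t)
      adv, m = torch.randn(b, 1), torch.ones(b, t)
      print(dapo_token_level_loss(lp, old_lp, adv, m))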

  4. The attention mechanism in Large Language Models (LLMs) helps derive the meaning of a word from its context. This involves encoding words as multi-dimensional vectors, calculating query and key vectors, and using attention weights to adjust the embedding based on contextual relevance.
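
    A toy single-head version of that computation in NumPy, showing the query/key/value projections, the softmax attention weights, and the context-adjusted output; this is a simplified sketch, not any particular model's implementation:

      import numpy as np

      def attention(X, W_q, W_k, W_v):
          Q = X @ W_q              # query vectors: "what am I looking for?"
          K = X @ W_k              # key vectors:   "what do I contain?"
          V = X @ W_v              # value vectors: the information to pass along
          scores = Q @ K.T / np.sqrt(K.shape[-1])            # pairwise contextual relevance
          weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
          weights /= weights.sum(axis=-1, keepdims=True)     # softmax attention weights
          return weights @ V       # each token's embedding, adjusted by its context

      # Toy usage: 4 tokens with 8-dimensional embeddings.
      rng = np.random.default_rng(0)
      X = rng.normal(size=(4, 8))
      W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
      print(attention(X, W_q, W_k, W_v).shape)               # (4, 8)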

  5. AAAI survey finds that most respondents are sceptical that the technology underpinning large language models is sufficient for artificial general intelligence.

    "More than three-quarters of respondents said that enlarging current AI systems ― an approach that has been hugely successful in enhancing their performance over the past few years ― is unlikely to lead to what is known as artificial general intelligence (AGI). An even higher proportion said that neural networks, the fundamental technology behind generative AI, alone probably cannot match or surpass human intelligence. And the very pursuit of these capabilities also provokes scepticism: less than one-quarter of respondents said that achieving AGI should be the core mission of the AI research community.

  6. This article explores the application of reinforcement learning (RL) to Partial Differential Equations (PDEs), highlighting the complexity and challenges involved in controlling systems described by PDEs compared to Ordinary Differential Equations (ODEs). It discusses various approaches, including genetic programming and neural network-based methods, and presents experimental results on controlling PDE systems like the diffusion equation and Kuramoto–Sivashinsky equation. The author emphasizes the potential of machine learning to improve understanding and control of PDE systems, which have wide-ranging applications in fields like fluid dynamics, thermodynamics, and engineering.
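
    A toy environment sketch for one of the systems mentioned, the 1D diffusion equation, with a hand-written proportional controller standing in for a learned policy; the grid size, time step, and reward are assumptions for illustration, not the article's setup:

      import numpy as np

      def diffusion_step(u, control, nu=0.1, dx=0.1, dt=0.001):
          # One explicit finite-difference step of u_t = nu * u_xx + control
          # with zero (Dirichlet) boundary conditions.
          u_xx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
          u_next = u + dt * (nu * u_xx + control)
          u_next[0] = u_next[-1] = 0.0
          return u_next

      # One "episode": drive an initial temperature profile toward zero.
      n, steps = 64, 500
      u = np.sin(np.linspace(0, np.pi, n))       # initial state
      for _ in range(steps):
          action = -5.0 * u                      # proportional controller as a stand-in policy
          u = diffusion_step(u, action)
          reward = -np.sum(u**2)                 # reward: stay close to the target state
      print(f"final deviation from target: {np.sqrt(np.mean(u**2)):.4f}")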

  7. The article delves into how large language models (LLMs) store facts, focusing on the role of multi-layer perceptrons (MLPs) in this process. It explains the mechanics of MLPs, including matrix multiplication, bias addition, and the Rectified Linear Unit (ReLU) function, using the example of encoding the fact that Michael Jordan plays basketball. The article also discusses the concept of superposition, which allows models to store a vast number of features by utilizing nearly perpendicular directions in high-dimensional spaces.
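
    A toy version of that MLP block in NumPy: up-projection plus bias, ReLU, then a down-projection added back to the residual stream, with hand-built weights so one hidden neuron detects a hypothetical "Michael Jordan" direction and writes a "plays basketball" direction. This illustrates the idea only; real weights are learned and far higher-dimensional:

      import numpy as np

      def mlp_block(x, W_up, b_up, W_down, b_down):
          hidden = np.maximum(0.0, x @ W_up + b_up)    # matrix multiply, add bias, apply ReLU
          return x + (hidden @ W_down + b_down)        # project back and add to the residual stream

      d_model, d_hidden = 16, 4
      rng = np.random.default_rng(0)
      jordan = rng.normal(size=d_model);     jordan /= np.linalg.norm(jordan)
      basketball = rng.normal(size=d_model); basketball /= np.linalg.norm(basketball)

      W_up = np.zeros((d_model, d_hidden)); W_up[:, 0] = jordan          # neuron 0 detects "Michael Jordan"
      W_down = np.zeros((d_hidden, d_model)); W_down[0, :] = basketball  # and writes "plays basketball"
      b_up, b_down = np.zeros(d_hidden), np.zeros(d_model)

      out = mlp_block(jordan, W_up, b_up, W_down, b_down)
      print(f'"plays basketball" component added: {(out - jordan) @ basketball:.2f}')  # ~1.00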

  8. The article explores the architectural changes that enable DeepSeek's models to perform well with fewer resources, focusing on Multi-Head Latent Attention (MLA). It discusses the evolution of attention mechanisms, from Bahdanau to Transformer's Multi-Head Attention (MHA), and introduces Grouped-Query Attention (GQA) as a solution to MHA's memory inefficiencies. The article highlights DeepSeek's competitive performance despite lower reported training costs.
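
    A minimal Grouped-Query Attention sketch in PyTorch, showing how several query heads share each key/value head, which is what shrinks the KV cache relative to full Multi-Head Attention; shapes and head counts are illustrative, and this is not DeepSeek's MLA implementation:

      import torch

      def grouped_query_attention(x, W_q, W_k, W_v, n_q_heads=8, n_kv_heads=2):
          seq, d_model = x.shape
          d_head = d_model // n_q_heads
          group = n_q_heads // n_kv_heads                # query heads sharing each KV head

          q = (x @ W_q).view(seq, n_q_heads, d_head)
          k = (x @ W_k).view(seq, n_kv_heads, d_head)    # only n_kv_heads of keys are cached
          v = (x @ W_v).view(seq, n_kv_heads, d_head)    # likewise for values

          # Repeat each key/value head so it serves its whole group of query heads.
          k = k.repeat_interleave(group, dim=1)
          v = v.repeat_interleave(group, dim=1)

          scores = torch.einsum("qhd,khd->hqk", q, k) / d_head**0.5
          weights = scores.softmax(dim=-1)
          out = torch.einsum("hqk,khd->qhd", weights, v)
          return out.reshape(seq, d_model)

      # Toy usage: the K and V projections are 4x narrower than the Q projection.
      seq, d_model = 16, 64
      x = torch.randn(seq, d_model)
      W_q = torch.randn(d_model, d_model)
      W_k = torch.randn(d_model, d_model // 4)
      W_v = torch.randn(d_model, d_model // 4)
      print(grouped_query_attention(x, W_q, W_k, W_v).shape)   # torch.Size([16, 64])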

  9. Scaling reinforcement learning (RL) to surpass OpenAI's o1 in deep learning models

  10. The article introduces a new approach to language modeling called test-time scaling, which enhances performance by utilizing additional compute resources during testing. The authors present a method involving a curated dataset and a technique called budget forcing to control compute usage, allowing models to double-check answers and improve reasoning. The approach is demonstrated with the Qwen2.5-32B-Instruct language model, showing significant improvements on competition math questions.
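
    A minimal sketch of budget forcing as described: append "Wait" when the model tries to stop reasoning too early, then append the end-of-thinking delimiter to force a final answer. The delimiters, thresholds, and the generate() stand-in are assumptions for illustration, not the paper's exact code:

      def budget_forced_generate(prompt, generate, min_thinking_words=12, max_waits=1):
          # `generate(text, stop)` is a stand-in for any LLM completion call that
          # returns new text and halts when it would emit `stop`.
          text = prompt + "<think>"
          waits = 0
          while True:
              text += generate(text, stop="</think>")
              if len(text.split()) < min_thinking_words and waits < max_waits:
                  text += " Wait"        # suppress stopping; force the model to double-check
                  waits += 1
                  continue
              break                      # budget spent (or stopping allowed): end thinking
          text += "</think>\nFinal answer:"
          return text + generate(text, stop=None)

      # Dummy stand-in model so the sketch runs end to end.
      def dummy_generate(text, stop):
          return " ...some reasoning..." if stop == "</think>" else " 42"

      print(budget_forced_generate("Q: What is 6 x 7?\n", dummy_generate))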


